UEP239 Final Project: Where the Book Worms Go

A suitability analysis of where to live for the best reading environments, public access, and book communities

By Kees Schipper

Suitability for bookworms

What conditions make an area an optimal living space for bookworms? A few conditions of importance might be:

Throughout this analysis, we are going to consider these factors in the form of six variables:

Below, we'll start by importing all of the libraries that we are going to use for this analysis.

Change current working directory to the data folder for easier reading of files

Part 1: Reading in our data

1a: Tabular Data

Let's also look at some basic descriptions of our data to see what we're dealing with

Interesting...it seems like pctInCollege is not a numeric object. Let's explore why that is

In the unique values, we see that one entry is a '-' instead of a number, which prevents Python from interpreting this series as a float. Let's change that string value to 0.0, and then we can convert the series to a float.
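As a minimal sketch of that cleanup, assuming the education table is a pandas DataFrame: only the column name pctInCollege comes from the analysis, while the frame name `edu` and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the education table; only the column name
# pctInCollege comes from the analysis, the values are invented.
edu = pd.DataFrame({"pctInCollege": ["12.5", "-", "40.1", "7.0"]})

# Look at the unique values to spot the non-numeric placeholder.
print(edu["pctInCollege"].unique())

# Swap the '-' placeholder for a zero, then cast the whole column to float.
edu["pctInCollege"] = edu["pctInCollege"].replace("-", "0.0").astype(float)

print(edu["pctInCollege"].dtype)
```

If a true zero would be misleading, `pd.to_numeric(..., errors="coerce")` turns the placeholder into NaN instead.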

Part 1b: Reading in Shapefiles

See what the CRS values are for each of our shapefiles

We need to reproject MPO_Bounds and ZCTA so that they match the CRS of the other shapefiles (EPSG:26986)
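A small sketch of the reprojection step, assuming the layers are GeoDataFrames. The one-point layer below is a made-up stand-in for MPO_Bounds or ZCTA, and its starting CRS (EPSG:4326) is an assumption.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical one-point layer in geographic coordinates (EPSG:4326);
# the real MPO_Bounds and ZCTA layers would come from the shapefiles.
layer = gpd.GeoDataFrame(
    {"name": ["Boston (approximate)"]},
    geometry=[Point(-71.06, 42.36)],
    crs="EPSG:4326",
)

# Reproject to EPSG:26986 so the layer lines up with the other shapefiles.
layer_26986 = layer.to_crs(epsg=26986)

print(layer_26986.crs.to_epsg())
print(layer_26986.geometry.iloc[0])
```

Note that to_crs returns a new GeoDataFrame, so the result has to be assigned to a name.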

Let's try overlaying the maps to see that they all line up

Everything checks out

The unit of measurement for EPSG:26986 is meters. This will come in handy later when we reclassify our datasets

So far, everything lines up pretty nicely, but we should limit our study area to just the Boston metropolitan area. We need to find which region in MPO_Bounds corresponds to Boston, and then clip all of our shapefiles to that region.

Now we need to clip all of our other data (ZCTA, Libs, mbta_node) to the extent of Bos_Bounds. We can do this using the gpd.clip() function
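Here is a toy sketch of how gpd.clip() behaves. The square boundary and the two points are invented stand-ins for Bos_Bounds and Libs, not the real layers.

```python
import geopandas as gpd
from shapely.geometry import Point, box

crs = "EPSG:26986"

# Hypothetical boundary polygon and point layer standing in for Bos_Bounds and Libs.
bos_bounds = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10)], crs=crs)
libs = gpd.GeoDataFrame(
    {"name": ["inside", "outside"]},
    geometry=[Point(5, 5), Point(20, 20)],
    crs=crs,
)

# Keep only the features that fall within the boundary polygon.
libs_clip = gpd.clip(libs, bos_bounds)

print(libs_clip["name"].tolist())
```

Both layers need to share a CRS before clipping, which is why the reprojection comes first.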

At this point in our analysis, we only really need the Bos_bounds vector data to clip other rasters. All the data for our suitability analysis are in our other datasets

1c: Read in Raster data

Part 2: Rasterizing Point Data

We can use the features.rasterize function to turn our MBTA train, bus, and library points into rasters. We can then calculate the distance from every cell in the newly made rasters to the nearest landmark of interest. These distances can then be used for scoring, where each cell receives a score from 1 to 5 depending on the distance band it falls in.
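To make the idea concrete without the real data, the sketch below burns two made-up points into a small NumPy grid by hand (standing in for features.rasterize) and then measures the distance to the nearest point. Using SciPy's Euclidean distance transform for the distance step is an assumption on my part, and the grid size and cell size are invented.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Hypothetical grid: 5 x 5 cells of 100 m each (both values are assumptions).
cell_size = 100.0
shape = (5, 5)

# Hypothetical (row, col) cell indices of two point features, e.g. libraries.
rows = np.array([0, 4])
cols = np.array([0, 4])

# "Rasterize" the points: 1 where a feature sits, 0 everywhere else.
burned = np.zeros(shape, dtype=np.uint8)
burned[rows, cols] = 1

# distance_transform_edt measures the distance to the nearest zero cell, so we
# pass a mask that is False (zero) at the features. `sampling` converts cell
# counts into map units (meters here).
dist_m = distance_transform_edt(burned == 0, sampling=cell_size)

print(np.round(dist_m, 1))
```

Because EPSG:26986 is in meters, a distance raster built this way on the real grid would also be in meters.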

Above is an example of how the distance surfaces look around our features of interest. Libraries are much more spread out than train stops, and bus stops are the most densely clustered point data.

Let's create a 5-level reclassification structure for both the MBTA and library distances
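One way to sketch a 5-level reclassification is with np.digitize. The break values below are placeholders rather than the thresholds used in this analysis, and I'm assuming that closer cells should get the higher score.

```python
import numpy as np

# Hypothetical distances in meters.
dist_m = np.array([[50.0, 400.0, 900.0],
                   [1500.0, 2500.0, 6000.0]])

# Hypothetical break values in meters (assumptions, not the analysis thresholds).
breaks = [500, 1000, 2000, 4000]

# np.digitize returns 0 for distances below the first break and 4 for
# distances at or beyond the last one. Subtracting from 5 gives the closest
# cells a score of 5 and the farthest cells a score of 1.
score = 5 - np.digitize(dist_m, breaks)

print(score)
```

The same breaks can be reused for the train, bus, and library rasters, or each layer can get its own list.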

As the figure above shows, all of our data are on the same scale and could be added together to form a weight raster. However, I still want to look at a couple more variables: the population over 18 years old in each zipcode area and the population enrolled in college or graduate school at a public university.

Currently, we have complete rasters for four of our six desired factors:

  1. Proximity to public libraries
  2. Proximity to MBTA nodes
  3. Proximity to MBTA bus stops
  4. Different levels of tree canopy

Now we need to combine the tabular data from the CSV files we read in and join it to a GeoDataFrame

Part 3: Incorporating tabular data

Now we need to join our tabular data by attribute to a reference shapefile. To do this, we can first combine the age, sex, and education data into one tabular data frame, and then join that data frame to our clipped ZCTA data via the ZCTA5CE10 variable, keeping only rows where the key is already in the ZCTA_clip data frame. That way we will only get data in the Boston study area.
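The join logic can be sketched with plain pandas frames. Only ZCTA5CE10 and pctInCollege are names from this analysis; the other frame names, column names, and values are hypothetical, and the real left-hand table would be the clipped ZCTA GeoDataFrame.

```python
import pandas as pd

key = "ZCTA5CE10"

# Hypothetical stand-ins for the age, sex, and education tables.
age = pd.DataFrame({key: ["02115", "02139", "01002"], "pctOver18": [95.0, 90.0, 88.0]})
sex = pd.DataFrame({key: ["02115", "02139", "01002"], "pctFemale": [52.0, 50.0, 51.0]})
edu = pd.DataFrame({key: ["02115", "02139", "01002"], "pctInCollege": [60.0, 35.0, 70.0]})

# Hypothetical stand-in for the clipped ZCTA layer (geometry column omitted).
zcta_clip = pd.DataFrame({key: ["02115", "02139"]})

# Step 1: combine the three tables into one, matching rows on the ZCTA key.
tabular = age.merge(sex, on=key).merge(edu, on=key)

# Step 2: a left join keeps only the ZCTAs that are present in the clipped layer.
zcta_age_edu = zcta_clip.merge(tabular, on=key, how="left")

print(zcta_age_edu)
```

Keeping the key as a string matters here, since Massachusetts ZCTAs start with a zero that would be lost if the column were read as an integer. With a GeoDataFrame on the left, the same merge call keeps the geometry column.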

After the merge, we should look to visualize our data to see the spatial distribution of our ACS demographic variables

Part 3b: Create a function that takes vector data and reclassifies a column into 5 quantile classes
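A function along these lines might look like the sketch below, which uses pd.qcut on a plain DataFrame. The function name, the new column name, and the sample values are all hypothetical. The same call should work on a GeoDataFrame, since it only touches one attribute column.

```python
import pandas as pd

def reclass_quantiles(df, column, new_column, n_classes=5):
    """Return a copy of df with `column` scored into quantile classes (1 = lowest)."""
    out = df.copy()
    # Rank first so that tied values cannot produce duplicate bin edges in qcut.
    ranks = out[column].rank(method="first")
    # labels=False yields integer codes 0..n_classes-1; add 1 to score from 1.
    out[new_column] = pd.qcut(ranks, q=n_classes, labels=False) + 1
    return out

# Hypothetical percentages for ten ZCTAs.
zcta = pd.DataFrame(
    {"pctInCollege": [2.0, 5.0, 9.0, 14.0, 20.0, 27.0, 35.0, 44.0, 60.0, 80.0]}
)
zcta = reclass_quantiles(zcta, "pctInCollege", "college_score")

print(zcta)
```

With ten rows and five classes, each score ends up with exactly two ZCTAs.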

Above, we can see our 5-level reclassification of ZCTAs, where 5 represents a "good" score for bookworms and 1 represents a "bad" score. Areas where a high percentage of individuals are enrolled in college may have a large number of people from whom one could borrow books. On the other hand, sparsely populated areas might have high percentages of college-enrolled students without that adding up to a large number of students to borrow from. In fact, a low percentage of college enrollment looks more predictive of total population size than of the number of students in college, even though cities likely contain a higher density of colleges.

The second graph shows the areas with the highest absolute concentration of college-enrolled population, which lie closer to downtown Boston, Newton, Brookline, and northern suburbs like Malden and Somerville. These areas likely have a large number of students from whom one could borrow books (if you make the right connections).

The last map, percent over 18, again shows areas where readers could go to make connections to borrow books. The older someone is, the more time they may have had to accumulate books. Therefore, a high percentage of the population over 18 indicates the share of residents from whom you could plausibly borrow books.

Part 4: Zonal statistics and converting rasters to polygons

Let's start by writing a function that displays a raster with a boundary overlay, calculates zonal statistics, displays them, and returns a polygon layer
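The core of a zonal statistic can be sketched in NumPy alone. This is not the function used in the notebook (a package such as rasterstats is a common choice for the real thing). It assumes a hypothetical zone raster in which each cell already holds the integer id of the polygon covering it.

```python
import numpy as np

# Hypothetical 1-5 score raster (0 where there is no data).
score = np.array([[5, 5, 3, 1],
                  [4, 4, 2, 1],
                  [0, 0, 2, 2]], dtype=float)

# Hypothetical zone raster of the same shape: each cell holds the id of the
# polygon that covers it, and 0 means the cell is outside every polygon.
zones = np.array([[1, 1, 2, 2],
                  [1, 1, 2, 2],
                  [0, 0, 2, 2]])

def zonal_mean(values, zone_ids):
    """Return {zone_id: mean of values in that zone}, skipping zone id 0."""
    sums = np.bincount(zone_ids.ravel(), weights=values.ravel())
    counts = np.bincount(zone_ids.ravel())
    return {int(z): sums[z] / counts[z] for z in np.unique(zone_ids) if z != 0}

means = zonal_mean(score, zones)
print(means)
```

Each mean can then be attached to its polygon as a new attribute, which is what turns the raster scores into a polygon layer.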

The four raster layers we have are named the following:

And our boundary layer is named ZCTA_AgeEdu

Now we can combine all of our scores into one attribute. I'll create an unweighted and a weighted score to see how much of a difference there is. The variables of interest that we have are:

I'm going to call my variable for unweighted suitability unw_suitability and my variable for weighted suitability w_suitability

The weighting scheme for w_suitability is justified as follows. The two population variables were each given a weight of 10%, as borrowing books from complete strangers is uncommon, though it might be easy to get books from college students through Facebook Marketplace or some e-commerce websites. Canopy was given a weight of 10%: although it's nice to have outdoor reading spaces with plenty of shade, the truly committed reader doesn't need an ideal environment to enjoy a good book (though it definitely helps). The two distances to public transportation were given a weight of 20% each, as transit might be necessary to reach larger libraries, like the Boston Public Library, that are farther away. In addition, it's nice to not have to lug a ton of books home if you have access to a bus. Finally, distance to a library was given the highest weight, 30%, as a free library nearby gives you access to books even if you don't have money to spend on them, provided you can get a free library card. Libraries are also excellent reading areas and places to hang out.
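As a sketch of how the two scores could be computed, assume each factor already has a 1-5 score column. The score column names and values below are hypothetical; only unw_suitability and w_suitability are names from this analysis. Taking the unweighted score as a simple mean is also an assumption, and a plain sum would rank the ZCTAs identically.

```python
import pandas as pd

# Hypothetical 1-5 scores for two made-up areas; column names are assumptions.
scores = pd.DataFrame(
    {
        "college_score": [5, 1],
        "over18_score": [4, 2],
        "canopy_score": [3, 3],
        "train_score": [5, 1],
        "bus_score": [5, 2],
        "library_score": [5, 1],
    },
    index=["A", "B"],
)

# Weights from the scheme described above: 10% for each population variable,
# 10% for canopy, 20% for each transit distance, and 30% for library distance.
weights = {
    "college_score": 0.10,
    "over18_score": 0.10,
    "canopy_score": 0.10,
    "train_score": 0.20,
    "bus_score": 0.20,
    "library_score": 0.30,
}
# Sanity check: the weights should account for 100% of the score.
assert abs(sum(weights.values()) - 1.0) < 1e-9

factor_cols = list(weights)
scores["unw_suitability"] = scores[factor_cols].mean(axis=1)
scores["w_suitability"] = sum(scores[col] * w for col, w in weights.items())

print(scores[["unw_suitability", "w_suitability"]])
```

Both scores stay on the 1-5 scale, so they can be compared directly.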

Finally, let's write our output dataset to a file so that we can access it in the future

Now we can report the highest and lowest suitability scores and their respective zipcodes
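A small sketch of that lookup, using idxmax and idxmin on a table indexed by ZCTA. The ZCTA ids and scores below are invented, and the helper name `extremes` is hypothetical.

```python
import pandas as pd

# Hypothetical output table; the ZCTA ids and scores are made up for illustration.
result = pd.DataFrame(
    {
        "ZCTA5CE10": ["00001", "00002", "00003"],
        "unw_suitability": [2.5, 4.0, 3.0],
        "w_suitability": [2.0, 4.4, 3.1],
    }
).set_index("ZCTA5CE10")

def extremes(df, column):
    """Return ((best_id, best_score), (worst_id, worst_score)) for a score column."""
    best, worst = df[column].idxmax(), df[column].idxmin()
    return (best, df.loc[best, column]), (worst, df.loc[worst, column])

for col in ["unw_suitability", "w_suitability"]:
    (best_id, best_val), (worst_id, worst_val) = extremes(result, col)
    print(f"{col}: highest {best_id} ({best_val}), lowest {worst_id} ({worst_val})")
```

If several ZCTAs tie for the top or bottom score, idxmax and idxmin return only the first one.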

Conclusions:

In the unweighted analysis, zipcode 01451 was the least suitable area for bookworms to relocate to, while in the weighted analysis, zipcode 02762 was the lowest-scoring zipcode. These zipcodes correspond to Harvard, Lancaster, and Devens, Massachusetts, and to Plainville and Foxborough, Massachusetts, respectively. In both analyses, the highest-scoring zipcode was ZCTA 02115, which corresponds to an area in Boston containing Northeastern University, the Ruggles train stop, the Isabella Stewart Gardner Museum, Emmanuel College, Brigham and Women's Hospital, and much of the Longwood Medical Area. The Prudential Center is also just north of this zipcode area, and there is a cafe called Trident Booksellers and Cafe in the area. This area is also about half a mile down the street from the Boston Public Library.